Is there a benefit for anesthesiologists of adding difficult airway scenarios for learning fiberoptic intubation skills using virtual reality training? A randomized controlled study

Fiberoptic intubation for a difficult airway requires significant experience. Traditionally only normal airways were available for high fidelity bronchoscopy simulators. It is not clear if training on difficult airways offers an advantage over training on normal airways. This study investigates the added value of difficult airway scenarios during virtual reality fiberoptic intubation training. A prospective multicentric randomized study was conducted from 2019 to 2020 among 86 inexperienced anesthesia residents, fellows and staff. Two groups were compared: Group N (control, n = 43) trained on a normal airway, and Group D (n = 43) trained first on a normal airway and then on three difficult airways. All were then tested by comparing their ORSIM® scores on 5 scenarios (1 normal and 4 difficult airways). The final evaluation ORSIM® score for the normal airway testing scenario was significantly higher for group N than for group D: median score 76% (IQR 56.5–90) versus 58% (IQR 51.5–69, p = 0.0039), but there was no difference in ORSIM® scores for the difficult intubation testing scenarios. A single exposure to each of 3 different difficult airway scenarios did not lead to better fiberoptic intubation skills on previously unseen difficult airways when compared to multiple exposures to a normal airway scenario. This finding may be due to the learning curve of approximately 5–10 exposures to a specific airway scenario required to reach proficiency.


Introduction
The management of difficult airways is associated with significant mortality in both the Intensive Care Unit (ICU) and the operating room [1,2]. Airway management devices have evolved and guidelines in airway management have changed over the last decade [3][4][5][6][7]. Fiberoptic bronchoscopy (FB) remains an accepted first choice for patients with known difficult intubation or as a rescue technique for non-anticipated difficult intubations [8]. However, learning FB remains challenging because of its infrequent use, which is further compounded by the increasing use of video-laryngoscopes [4,8,9,10,11]. Simulation appears to be an efficient tool for training health care providers using otherwise rare clinical scenarios [12][13][14]. Many studies have indeed shown the value of simulation for bronchoscopy training, but none of them have studied its use during difficult airway management [15][16][17][18][19][20][21][22][23][24][25][26].
Until the recent development of a virtual reality fiberoptic simulator with multiple difficult airway scenarios, mostly normal airways were available for high fidelity bronchoscopy simulators [15,17,21,22,27,28,29]. It is, however, not clear if training on difficult airway scenarios offers a significant advantage over training on normal airway scenarios [30].
We hypothesized that inexperienced anesthesia providers would improve more by training on a virtual reality simulator with difficult airway scenarios than by training on normal airway scenarios (primary outcome: improved performance score; secondary outcomes: success rates, times, collision avoidance and perceived level of difficulty).

Materials and methods
From June 2019 to March 2020, this study recruited anesthesia residents, fellows and staff physicians who had performed 10 or fewer fiberoptic intubations. Participating centers were Toulouse (France), Clermont-Ferrand (France), Coventry (United Kingdom) and Oxford (United Kingdom). The study protocol was not registered but was communicated to all participating centers.

Consent
Each participating center obtained ethics approval through its local Research Ethics Committee. This study's protocol was registered with the Comité d'Ethique pour les Recherches (CER) of Toulouse University under the number 2018-099 for the centers in Toulouse and Clermont-Ferrand (France) and with the Health and Research Authority in Wales (Ref 19/HRA/5142) for the centers in Oxford and Coventry (United Kingdom). All participants gave written informed consent.

Virtual reality high fidelity simulator
The ORSIM® virtual reality bronchoscopy simulator (Airway Simulation Limited, Auckland, New Zealand) incorporates a replica video bronchoscope, a desktop sensor module, and a dedicated laptop computer. The ORSIM® contains software recreating a high-fidelity FB intubation scenario for the user. It includes learning modules on airway anatomy and dexterity and uses both normal and difficult airway intubation scenarios (https://www.orsim.co.nz/design).

ORSIM® score
Validity and reliability of the ORSIM® simulator as an assessment tool have been shown in a previous study by Baker et al., which relied on external scoring of a large number of video recordings, an approach that was not felt to be feasible for research [30]. Their results provided the basis for establishing computer-generated metrics [30] and for the development of a proprietary ORSIM® score. This ORSIM® score (which has not been validated against expert scores) appears on the ORSIM® Session Results screen after the participant finishes a given scenario and was recorded to grade their performance.

Gopher module
To assess the pre-test dexterity of each participant, the 'Gopher' module contained in the ORSIM® was used: it consists of a challenge game to catch nine different gopher figures using the ORSIM® bronchoscope.

Data collection
Demographic data were collected, including age, gender, dominant hand, years of residency training, previous FB experience, previous ORSIM® experience and video game experience. Data from the ORSIM® simulator were also collected, including type of scenario, time to successful completion, collision avoidance percentage score, minimal oxygen saturation, ORSIM® score for each scenario and the degree of difficulty of each scenario as judged by the study subjects.
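
The variables listed above map naturally onto one record per participant and one record per scenario attempt. The following is a minimal sketch in Python, not the study's actual data dictionary: all class and field names are hypothetical, the time unit (seconds) is an assumption, and only the eligibility limit of 10 prior fiberoptic intubations and the 0 to 100 VAS range come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ScenarioAttempt:
    """One attempt at one simulator scenario (illustrative field names)."""
    scenario: str                           # e.g. "normal", "epiglottic cyst"
    time_to_completion_s: Optional[float]   # None if the scenario was not completed
    collision_avoidance_pct: float          # percentage score, 0-100
    min_oxygen_saturation_pct: float
    orsim_score_pct: float
    perceived_difficulty_vas: int           # 0 (easy) to 100 (very difficult)

    def __post_init__(self):
        if not 0 <= self.perceived_difficulty_vas <= 100:
            raise ValueError("VAS must be between 0 and 100")

    @property
    def completed(self) -> bool:
        return self.time_to_completion_s is not None


@dataclass
class Participant:
    """Demographic record plus the attempts made by one participant."""
    participant_id: str
    group: str                       # "N" or "D"
    age: int
    gender: str
    dominant_hand: str
    years_of_residency: int
    previous_fb_intubations: int
    previous_orsim_experience: bool
    video_game_experience: bool
    attempts: List[ScenarioAttempt] = field(default_factory=list)

    def eligible(self) -> bool:
        # Inclusion criterion from the text: 10 or fewer prior fiberoptic intubations.
        return self.previous_fb_intubations <= 10


# Made-up example values, for illustration only.
participant = Participant("P001", "D", 29, "F", "right", 3, 2, False, True)
participant.attempts.append(
    ScenarioAttempt(
        scenario="normal",
        time_to_completion_s=95.0,
        collision_avoidance_pct=80.0,
        min_oxygen_saturation_pct=97.0,
        orsim_score_pct=58.0,
        perceived_difficulty_vas=40,
    )
)
```

Keeping attempts as a list on the participant makes it straightforward to separate training attempts from final test attempts later, for example by scenario name.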

Study design
Participant recruitment, randomization and allocation are displayed in Fig 1. After written informed consent was obtained, a study form was completed. Participants then watched a video explaining the functionalities of the ORSIM® simulator and the flexible bronchoscope. A pre-test dexterity test (Gopher module) was done before and then again after five minutes of hands-on familiarization with the bronchoscope simulator. Afterwards participants were randomized (by random number drawing) into two groups: Group N trained on normal airway scenarios up to a maximum of 40 minutes, and Group D trained instead on one normal and then three difficult airway scenarios (a retropharyngeal abscess, an epiglottic cyst, and a macroglossia). Each participant in Group D had up to 10 minutes per scenario before being prompted to move to the next scenario (i.e. up to a total of 40 minutes). Completion times, ORSIM® scores, collision avoidance, lowest oxygen saturation and perceived level of difficulty on a Visual Analogue Scale (VAS) from 0 (easy) to 100 (very difficult) were recorded.
The final step consisted of testing the participants on their fiberoptic difficult intubation skills by using one normal and four difficult scenarios (airway trauma, severe epiglottitis, false cord cyst and angioneurotic oedema). Each participant had to try to succeed once within a maximum of ten minutes for each scenario. Times, ORSIM® scores, collision avoidance, lowest oxygen saturation and perceived level of difficulty on a VAS from 0 (easy) to 100 (very difficult) were recorded. The investigator was not allowed to guide or help the participant to solve scenarios. The investigator's role was strictly to observe and keep time, as well as to manage any technical difficulty with the ORSIM® simulator.
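
The allocation step and the training time budget described above can be sketched as follows. The text states only that randomization was "by random number drawing", so the unrestricted draw used here is an assumption (it does not by itself guarantee equal group sizes), and the function and constant names are hypothetical.

```python
import random


def allocate(participant_ids, seed=None):
    """Assign each participant to group 'N' or 'D' by an independent random draw.

    Assumption: simple (unrestricted) randomization; the study's exact
    procedure is not described beyond 'random number drawing'.
    """
    rng = random.Random(seed)
    return {pid: rng.choice(["N", "D"]) for pid in participant_ids}


# Training scenarios for group D, as listed in the study design.
GROUP_D_TRAINING = [
    "normal",
    "retropharyngeal abscess",
    "epiglottic cyst",
    "macroglossia",
]
MINUTES_PER_SCENARIO = 10

# Four scenarios at up to 10 minutes each gives the same 40-minute cap as group N.
max_training_minutes = len(GROUP_D_TRAINING) * MINUTES_PER_SCENARIO

allocation = allocate(range(1, 11), seed=2019)
```

Passing a fixed seed makes the allocation reproducible, which is convenient for illustration; a real trial would normally conceal the sequence from the investigators.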

Primary outcome
ORSIM® scores for the final test scenarios were compared between the two groups for each scenario.

Secondary outcomes
Times and ORSIM® scores in normal airway scenarios before and after training were compared between the two groups. Success rates, times, collision avoidance and perceived level of difficulty were also compared.

Statistical analyses
REDCap® software (Research Electronic Data Capture®) was used to collect data [31,32]. R® software (R Foundation for Statistical Computing, Vienna, Austria) was used for the statistical analyses. The power of this study was estimated using the data from the previous ORSIM® study by Baker et al. [30]: 150 participants were needed to show a significant difference of 25% with a power of 90%. A Chi-square test and Fisher's exact test were used to test differences between groups N and D for categorical variables. A t-test (or a Wilcoxon-Mann-Whitney test for data that were not normally distributed) was used to test differences for continuous or ordinal variables. Statistical significance was taken as p<0.05. Results are expressed as medians and interquartile ranges or as numbers and percentages.
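
The median/IQR summaries and the non-parametric group comparison described above can be illustrated with a short sketch. The study used R; this Python version is only an illustration under stated assumptions: it uses the "inclusive" quartile definition (the paper does not say which definition was used), a normal approximation with tie correction and no continuity correction for the Wilcoxon-Mann-Whitney test (R may use an exact method for small samples without ties), and made-up scores. The function names are hypothetical.

```python
import math
import statistics


def median_iqr(values):
    """Return (median, Q1, Q3) using the 'inclusive' quartile definition."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, q1, q3


def mann_whitney_u(x, y):
    """Two-sided Wilcoxon-Mann-Whitney test via the normal approximation.

    Returns (U statistic for x, p-value). Ties receive midranks and the
    variance is corrected for ties; no continuity correction is applied.
    """
    combined = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n = len(combined)
    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2          # mean of ranks i+1 .. j
        t = j - i                          # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:                         # all observations identical
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail
    return u1, p_value


# Made-up ORSIM-style scores (percent), for illustration only.
group_n = [76, 90, 56, 82, 61, 95, 70]
group_d = [58, 51, 69, 62, 49, 73, 55]

median_n, q1_n, q3_n = median_iqr(group_n)
u_stat, p_value = mann_whitney_u(group_n, group_d)
print(f"Group N median {median_n} (IQR {q1_n}-{q3_n}); U = {u_stat}, p = {p_value:.4f}")
```

For real analyses, an established implementation such as R's `wilcox.test` or SciPy's `mannwhitneyu` would be preferable to this hand-written approximation.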

Recruitment and exclusions
The study began in June 2019 and was stopped in March 2020 due to the Covid-19 pandemic. This analysis includes 86 participants out of the 150 expected enrolments (43 in group N, 43 in group D) from Toulouse (n = 25) and Clermont-Ferrand (n = 25) in France, and from the University Hospitals Coventry and Warwickshire NHS Trust (n = 16) and Oxford University Hospitals NHS Foundation Trust (n = 20) in the UK. A total of 105 participations were recorded, but nine were excluded because of missing or invalid data and a further nine because of a protocol violation (they had already performed more than 10 fiberoptic intubations). Finally, one participant had been recruited twice: only the first of those attempts was kept for analysis.

Demographics and pre-test dexterity
Table 1 shows the demographic data and Table 2 shows the pre-test dexterity. There was no difference in experience level with the fiberoptic bronchoscope or in video game expertise between groups N and D, and there was no difference in pre-test dexterity between the two groups.

Final ORSIM® score (primary outcome)
Table 3 shows the comparison of both groups' ORSIM® scores during the final testing (primary outcome). The final test ORSIM® score for the normal airway scenario was significantly higher for group N than for group D: median score 76% (IQR 56.5-90) versus 58% (IQR 51.5-69) respectively (p = 0.0039). There was no statistically significant difference between the two groups for the final test ORSIM® score on any of the difficult intubation scenarios (Table 3).

Performance evolution on normal airways
During the first try, at the beginning of training, there was no difference between groups in their performance on the normal airway scenario for ORSIM® scores, time to succeed and collision avoidance. During the final testing on the normal airway scenario (after completion of the training), the ORSIM® score for group N was significantly higher, and their times and estimated level of difficulty were significantly lower, than for Group D (Table 4).

Comparison of perceived level of difficulty between groups during final testing
There was a significant difference in perceived level of difficulty on a VAS (0 to 100) between the two groups for the severe epiglottitis scenario, which appeared easier to group D than to group N: 66 (57-80) versus 79 (68-90), p<0.05 (Table 5).

Discussion
This study shows a significant performance improvement for fiberoptic intubations on normal airways for the group who trained only on normal airways. In contrast, training on a variety of difficult airways did not translate to performance improvement on (other) difficult airway scenarios or on a normal airway. Simulation appears to be a new tool allowing learners to achieve a satisfactory skill level before clinical practice. Graeser et al. [26] showed that simulator training allows learners to enter the learning curve of airway management at a higher level. J.E. Smith et al. [33] described a learning curve of 10 fiberoptic intubations to achieve satisfactory competence and 30-50 fiberoptic intubations to reach expert level. Repetition of the same scenario seems to be the main determinant of learning. In our study, participants in group N had performed a median of 12 successive normal intubations before being tested (over a median of 15 minutes, Table 1), compared to a single attempt in group D. In group D, each scenario could only be done once, and participants in group D had thus performed the balance of their training on various difficult airways (over a median total of 14 minutes, Table 1). As our study shows, that training did not translate into better performance on the normal airway scenario. In their study on validating the ORSIM® [30], Baker et al. proposed a score of 70 to characterize an expert level. Our study shows a median ORSIM® score of 76% in the final testing phase for group N on the normal airway, after a median of 12 training attempts, suggesting that satisfactory competence, or even expert level, had been acquired after their repetitive training with this same normal airway scenario. On the other hand, for group D, the median ORSIM® score for each scenario remained well below 70%.
It would be interesting to see if the ORSIM® score during final testing for group D would have improved significantly if group D had also trained until achieving 'expert level' (a score of at least 70%) on each of their difficult training scenarios, but our experimental design did not allow us to study that (and doing so would likely have led to group D training for a much longer total time than group N, thus introducing a confounder).
The main finding of our study is thus that the learning may initially be limited to the actual scenario, and the degree to which learning extends to other scenarios remains unclear (but clinically important). It may be necessary to introduce repetitive training sessions of each difficult scenario before the full benefits would become apparent.
Although training on a variety of difficult airways for group D did not translate to performance improvement, it did seem to lead to a reduction in perceived difficulty for one scenario (the severe epiglottitis scenario). Similarly, acquisition of expertise level for group N significantly reduced the perception of difficulty on the normal airway scenario. In practice, we can often observe that stress and perception of difficulty can have a negative impact on performance, and training is probably the only way to remedy this [34]. Our study shows that training on difficult airway scenarios could indeed reduce the stress during subsequent attempts on difficult airway scenarios.
Yang et al. [35] showed that psychometric skills can predict the acquisition of procedural skill performance and Louridas et al. [36], looking at laparoscopic surgery, investigated the value of psychometric testing to predict the technical performance of new residents. It would be interesting to study the relationship between dexterity and time to reach an expert level for fiberoptic intubation skills for a 'competence by design' approach to teach fiberoptic intubation.

Limitations of the study
Due to the unforeseen circumstances posed by COVID-19, we were forced to halt trial recruitment and decided to perform an unplanned interim analysis of the results. We put in place a plan for a single interim analysis with a symmetrical stopping rule for p<0.002 using the Pocock rule. When we identified p = 0.0039 for our primary outcome in the direction of improved outcomes for the normal group, we decided to stop the trial based on futility. While the number of participants in our study was smaller than planned, the magnitude of the estimated effect sizes was consistently small. Furthermore, the group trained on the difficult scenarios was as likely to show worse results as it was to show improved results. For these two reasons we postulate that the inclusion of more participants would not change the primary outcome.
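
The numbers in this paragraph can be laid side by side in a short sketch. It only restates values reported in the text (the 0.002 threshold, the observed p = 0.0039, and 86 of 150 planned participants); the function and variable names are hypothetical, and the sketch does not reproduce how the Pocock boundary itself was derived.

```python
def crosses_boundary(p_value: float, threshold: float) -> bool:
    """Return True if the observed p-value falls below the stopping threshold."""
    return p_value < threshold


INTERIM_THRESHOLD = 0.002   # stopping threshold stated in the text
OBSERVED_P = 0.0039         # p-value reported for the normal airway test scenario

boundary_crossed = crosses_boundary(OBSERVED_P, INTERIM_THRESHOLD)  # False

PLANNED_PARTICIPANTS = 150
ANALYSED_PARTICIPANTS = 86
fraction_enrolled = ANALYSED_PARTICIPANTS / PLANNED_PARTICIPANTS    # about 0.57

print(f"Boundary crossed: {boundary_crossed}; enrolled {fraction_enrolled:.0%} of target")
```

Under this reading, the observed p-value does not fall below the stated threshold, which is consistent with the trial being stopped on futility grounds rather than by the stopping rule itself.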
Some participants in both groups had already used the ORSIM® simulator; however, the numbers were small and well distributed across both study groups, and thus unlikely to create a bias. While it was originally contemplated to exclude participants with previous ORSIM® experience, it was felt that this would unnecessarily restrict recruitment, given that the use of ORSIM® simulation was expanding rapidly.
In our study, we assessed only level 2 of Kirkpatrick's model of training evaluation: the testing of skills based on training [37]. It will be important to investigate actual changes in clinical practice (level 3) and ultimately changes in patient outcome (level 4) when evaluating the role of the ORSIM® simulator. The latter steps present significant logistical and ethical challenges and may involve the creation of actual patient airways through 3D printing technology as an interim step.

Conclusion
This study was not able to demonstrate a benefit of adding difficult airway scenarios for learners' performance during intubation training with a replica flexible bronchoscope. It only showed that repeated training on the ORSIM® simulator with the same scenario (a normal airway) led to increased intubation skills for the normal airway and that this skill was specific to the normal airway scenario. Training on normal airways did not predict better skills on difficult airways (and vice versa).
Subsequent studies should focus on the acquisition of a level of expertise in difficult airways and its reproducibility from one difficult airway scenario to another in order to validate this simulation model and better prepare the learner for clinical practice.